Disclaimer: I look down on technical topics like “What is the principle behind Go’s GMP scheduler?”
I’m usually not interested in studying this type of question. But it’s been asked so frequently in interviews that I’m now setting aside 2 hours to understand it. On one hand, I want to see if there’s any real technical depth behind it; on the other hand, I’m writing down the answer here. But I won’t let this information stay in my brain—next time I’m asked in an interview, I’ll still say I don’t know 😏
GMP is an acronym:
G: goroutine. Each go statement adds a new G.
M: machine, i.e. an OS thread.
P: processor. The number of P's is set by GOMAXPROCS, typically the number of CPU cores.
GMP means: spin up several M (threads) to execute G (goroutines), but at most P (cores) M's can run in parallel.
Here come some boring (and simplified) rules:
These may sound confusing now, but they’ll make more sense with the code examples later.
Here’s a basic code file to demonstrate spawning a goroutine:
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the go statement creates a new G
		defer wg.Done()
		fmt.Println("Hello from goroutine")
	}()
	wg.Wait() // block main until the goroutine finishes
}
Run it with debug flags:
go build demo0.go
GODEBUG='schedtrace=200,scheddetail=1' ./demo0
Don't use go run: the go command is itself a Go program, so with GODEBUG set it prints its own scheduler trace on top of yours. The binary version's logs are cleaner:
SCHED 0ms: gomaxprocs=10 idleprocs=7 threads=5 spinningthreads=1 needspinning=0 idlethreads=0 runqueue=0 gcwaiting=false nmidlelocked=1 stopwait=0 sysmonwait=false
P0: status=0 schedtick=0 syscalltick=0 m=nil runqsize=0 gfreecnt=0 timerslen=0
P1: status=1 schedtick=0 syscalltick=0 m=2 runqsize=0 gfreecnt=0 timerslen=0
...
Hello from goroutine
These logs show:
SCHED is a summary line: 10 P's were launched (gomaxprocs=10).
P1 is running, held by M2.
P0 is held by M3 and is in the spinning state (waiting for tasks).
You don't see the G that does the printing because it finishes too quickly. GMP details can be viewed via debug logs like this.
package main

import (
	"fmt"
	"runtime"
	"time"
)

// busy burns CPU for duration d without ever blocking.
func busy(tag string, d time.Duration) {
	end := time.Now().Add(d)
	x := 0
	for time.Now().Before(end) {
		x++
	}
	fmt.Println(tag, "done", x)
}

func main() {
	runtime.GOMAXPROCS(1) // a single P: A and B cannot run in parallel
	go busy("A", 1500*time.Millisecond)
	busy("B", 1500*time.Millisecond)
}
Sometimes “A” prints first, sometimes “B”.
We set GOMAXPROCS(1), so only one P exists. Yet Go's scheduler preempts a goroutine once it has been running for about 10ms, so even while go busy("A") is running, it is eventually descheduled, letting the main goroutine run B.
To visualize this more clearly:
func busy(tag string, d time.Duration) {
	end := time.Now().Add(d)
	next := time.Now()
	for time.Now().Before(end) {
		if time.Now().After(next) {
			fmt.Print(tag, " ") // print roughly every 100ms
			next = time.Now().Add(100 * time.Millisecond)
		}
	}
	fmt.Println(tag, "done")
}
Sample output: B A B A B A A B A B A B A B A B A B A B A B A B B A B A B A B done
This confirms that A and B alternate: they're interleaved by GMP's preemptive scheduler.
package main

import (
	"runtime"
	"sync"
	"time"
)

// spin burns CPU for duration d.
func spin(d time.Duration) {
	deadline := time.Now().Add(d)
	for time.Now().Before(deadline) {
	}
}

func main() {
	runtime.GOMAXPROCS(1) // start with a single P
	const N = 120
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() { defer wg.Done(); spin(500 * time.Millisecond) }()
	}
	time.Sleep(30 * time.Millisecond)
	runtime.GOMAXPROCS(4) // add P1–P3 while P0 still has a backlog
	wg.Wait()
}
Initially all G's go to one P. Then GOMAXPROCS is increased to 4, and the new P's (P1–P3) “steal” tasks from P0's queue.
Debug logs show:
P0: runqsize=17
P1: runqsize=5
P2: runqsize=5
P3: runqsize=17
Initially, all G’s were with P0. Later, others came and took some.
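To make the “steal” step a bit more concrete, here is a toy model. It is not the runtime's code: stealHalf and the plain int slices are invented for illustration, and it assumes a thief takes half of the victim's queue (rounded up), which is my understanding of what the runtime does. The sizes it prints will not match the real log above, since nothing is actually running or being scheduled.

```go
package main

import "fmt"

// stealHalf (hypothetical) moves the first half of victim's queue,
// rounded up, onto the end of thief's queue and returns both queues.
func stealHalf(victim, thief []int) (newVictim, newThief []int) {
	n := len(victim) - len(victim)/2 // half, rounded up
	newThief = append(thief, victim[:n]...)
	newVictim = victim[n:]
	return newVictim, newThief
}

func main() {
	queues := make([][]int, 4) // queues[i] stands in for Pi's runq
	for g := 0; g < 44; g++ {
		queues[0] = append(queues[0], g) // everything starts on P0
	}
	// P1, P2 and P3 show up one after another and each steals from P0.
	for thief := 1; thief < 4; thief++ {
		queues[0], queues[thief] = stealHalf(queues[0], queues[thief])
	}
	for i, q := range queues {
		fmt.Printf("P%d: runqsize=%d\n", i, len(q))
	}
}
```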
package main

import (
	"runtime"
	"sync"
	"time"
)

// spin burns CPU for duration d.
func spin(d time.Duration) {
	end := time.Now().Add(d)
	for time.Now().Before(end) {
	}
}

func main() {
	runtime.GOMAXPROCS(1)
	const N = 600 // more G's than one local runq can hold
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() { defer wg.Done(); spin(800 * time.Millisecond) }()
	}
	time.Sleep(500 * time.Millisecond)
	runtime.GOMAXPROCS(4)
	wg.Wait()
}
Use debug to analyze:
go build demo4.go
GODEBUG='schedtrace=200,scheddetail=1' ./demo4 &> demo4.log
Initial logs show:
runqueue=0 // global queue
P0: runqsize=0
P1: runqsize=0
Later, P0 starts many G’s:
P0: runqsize=204
runqueue=395 // global queue overflow
The default runq capacity is 256. Excess tasks go to the global queue. When P1–P3 enter, they pick tasks from the global queue first, not by stealing.
If a P finds no work in its runq or the global queue, it checks netpoll (OS), and if that fails, it spins.
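The overflow behavior can be sketched with a toy model as well. Again, this is not the runtime's code: toySched and put are names I made up, and the model assumes that when the local queue is full, half of it plus the new G is moved to the global queue (my reading of what runqputslow does). It ignores running goroutines, so the split it produces differs from the 204/395 in the log.

```go
package main

import "fmt"

const localCap = 256 // local runq capacity mentioned above

// toySched (hypothetical) models one P's local queue and the global queue.
type toySched struct {
	local  []int
	global []int
}

// put enqueues g locally; if the local queue is full, it moves half of the
// local queue plus g to the global queue instead.
func (s *toySched) put(g int) {
	if len(s.local) < localCap {
		s.local = append(s.local, g)
		return
	}
	half := len(s.local) / 2
	s.global = append(s.global, s.local[:half]...)
	s.global = append(s.global, g)
	s.local = append(s.local[:0], s.local[half:]...) // keep the newer half
}

func main() {
	s := &toySched{}
	for g := 0; g < 600; g++ {
		s.put(g)
	}
	fmt.Printf("P0: runqsize=%d\n", len(s.local))
	fmt.Printf("runqueue=%d\n", len(s.global))
}
```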
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(2)
	go func() {
		time.Sleep(2 * time.Second) // this G blocks for a long time
		fmt.Println("blocking done")
	}()
	go func() {
		for i := 0; i < 6; i++ {
			time.Sleep(300 * time.Millisecond)
			fmt.Println("still running", i)
		}
	}()
	time.Sleep(3 * time.Second) // keep main alive until both finish
}
Output:
still running 0
still running 1
...
blocking done
Even though the first G blocks, the second keeps running: a blocked G gives up its P, so other G's can use it.
package main

import (
	"fmt"
	"runtime"
	"time"
)

// spin never returns and never yields on its own.
func spin() {
	for {
	}
}

func main() {
	runtime.GOMAXPROCS(1)
	go spin()
	time.Sleep(100 * time.Millisecond)
	fmt.Println("I should still print unless preemption is off")
}
Two ways to run:
go build demo7.go
GODEBUG='schedtrace=1000,scheddetail=1' ./demo7
and:
go build demo7.go
GODEBUG='schedtrace=1000,scheddetail=1,asyncpreemptoff=1' ./demo7
With asyncpreemptoff=1, async preemption is disabled, and the print never happens because the infinite loop never gives up the P. This illustrates GMP's ability to yield CPU, and what happens when it's turned off.
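For contrast, here is a sketch of a loop that yields cooperatively, so it should not depend on async preemption at all. The names demo and stop are mine; runtime.Gosched and atomic.Bool (Go 1.19+) are standard library. I'd expect this one to print even with asyncpreemptoff=1, because every pass through the loop hands the P back to the scheduler.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// demo (hypothetical) runs a busy goroutine on a single P and returns a
// message once the caller has been scheduled again after sleeping.
func demo() string {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var stop atomic.Bool
	done := make(chan struct{})
	go func() {
		defer close(done)
		for !stop.Load() {
			runtime.Gosched() // cooperative yield point
		}
	}()

	time.Sleep(100 * time.Millisecond)
	stop.Store(true) // ask the busy goroutine to exit
	<-done
	return "main got the P back"
}

func main() {
	fmt.Println(demo())
}
```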
I didn't dig deeply into the source. For example, the constants for G/M/P are in src/runtime/runtime2.go, and runqputslow in src/runtime/proc.go handles queue overflow.
This post is probably incomplete—I’m not going any deeper. Some people may love diving into this.
GMP is an engineering implementation of a coroutine scheduler. Many care about the engineering details: task queue management, preemption, yielding, etc. But underneath, all coroutine schedulers are based on continuations. Go just happens to spotlight coroutines. Other languages can implement their own coroutine schedulers too.
So here’s the question: if you love researching GMP, have you studied coroutine/virtual thread/async function/process implementations in other languages? How do they differ from Go’s goroutines?
If my job someday requires a deep understanding of these, I’ll learn them.
I once said:
Go designed automatic garbage collection to reduce the mental load of memory management. Yet some interviewers go out of their way to master GC internals and quiz candidates on it. If you truly believe in using your brain to manage memory, why not just use Rust?
It’s like I’m learning to drive, and someone expects me to understand how internal combustion works before getting a license. I’m not building cars—I’m driving one.
Same goes for Go's go keyword. It was built to make concurrency easy. Yet some folks study its internals obsessively, and use that as a gauge of Go proficiency. Doesn't that contradict the language's intent? If a language needs you to understand scheduling internals to write good code, it has failed in its abstraction.
If you’re building programming languages or implementing schedulers, by all means, understand GMP thoroughly. Otherwise?